ABSTRACT

All Navy/Marine monthly aircraft accident rates exhibit 
a behavior of marked variability which cannot be attributed 
solely to weather or other natural phenomena. Variable meas- 
ures construed as time dependent were obtained for all major 
accidents between July 1968 and June 1974. Stepwise linear 
multiple regression studies relating the variables to acci- 
dent rate showed pilot age, daylight pilot flight hours for 
the 90 days preceding the accident, the number of night car- 
rier landings in the previous 30 days, and the number of day- 
light carrier landings in the previous 30 days explained 
46.65% of noted accident rate variance. The results corrob- 
orate previously held theories that pilot error is the single 
largest causal factor in aircraft accidents.

I. INTRODUCTION



Dollar costs and the number of aviation personnel killed 
each year have often been used to emphasize the importance of
determining a viable method to reduce Navy and Marine aircraft 
accidents. Research efforts conducted to date have dealt pri- 
marily with attempts to identify and understand the factors 
which were instrumental in causing accidents. Once identi- 
fied, these pertinent factors or variables were to become the 
crux for developing predictive models for accident occurrence. 

Aircraft accidents have been broadly categorized in terms 
of causal factors which were determined from extensive post- 
accident analysis by accident investigation teams. Aircraft 
or ground support equipment failures, weather, pilot or other 
flight personnel error, maintenance induced equipment mal- 
functions, and shortcomings in design have all been listed 
as primary cause for occurrence of major aircraft accidents. 

An accident is designated as a major accident if: 1) loss
of life is involved; 2) complete loss of an aircraft is
involved; or 3) substantial damage occurs to any aircraft
involved, where substantial damage is defined in Appendix A of
OPNAVINST 3750.6 (Series). Of the causal factors listed above,
the most common cause cited has been pilot error. Brictson,
et al. (1969), studied
aircraft carrier landing accidents spanning the years 1965 
through 1969. Although the study was limited to attack and 
fighter aircraft, the proportion of accidents attributed to 
pilot error is indicative of the inordinately high aircraft
and personnel loss rates incurred over all aircraft types. 
Approximately seventy-eight percent of accidents studied were 
linked to pilot error as being the primary causal factor.
Over eight percent of the accidents were attributed to errors
committed by other supporting personnel, leaving only thir- 
teen percent to be distributed among weather, aircraft fail- 
ure, equipment failure, and other causes. Brictson noted 
that the preponderance of accidents were of two types, hard 
landings and undershooting the landing area. Hard landings 
were more prevalent during daylight hours and undershots dur- 
ing hours of darkness. As expected, small carriers accounted 
for approximately seventy percent of the total accidents even 
though flight activity was less than on large carriers. 

Studies conducted for the Royal Air Force by Goorney 
(1965) were an attempt to subcategorize pilot error into its 
component parts, more molecular in scope than had previously 
been done. He concluded that lack of current flying experi- 
ence, fatigue, complacency, personal worries, and emotional 
stress directly contributed to pilot error and, if monitored, 
could be used to predict the likelihood of pilot error re- 
lated accidents. He attributed fatigue to pilots having had 
excessive ground duties prior to flying and aircrew compla- 
cency to lengthy flights in relatively simple aircraft. 
Questionnaires distributed among the ninety accident involved 
pilots making up his sample indicated that emotional stress 
and personal worries were caused by marital, dating, housing,
financial, and work oriented problems. 

Current flying experience or pilot proficiency has been 
singled out by some analysts as the best predictive measure
of pilot error accidents. Keller (1961) hypothesized that 
the amount of flight time logged by a pilot during a given 
period was positively correlated with optimal proficiency.
He intimated that were a pilot to fly the proper amount, he
would attain a safe, proficient ability as a pilot. Unfortu- 
nately, the hypothesis was not backed with specific guide- 
lines of how to ascertain the necessary hours of flight time 
which would neither favor fatigue nor insufficiency over 
optimal proficiency. 

Collicot, et al. (1972) categorized accident causal fac- 
tors into pilot error, material failure, maintenance error, 
and miscellaneous other causes. In comparing Navy-Marine F-4 
accident rates with Air Force accident rates, the authors 
attributed maintenance error disparities to the fact that 
Air Force F-4 aircraft realized approximately one-tenth of 
the Navy-Marine cannibalization rate to meet sufficient 
operational aircraft requirements. Naval-Marine officer job 
rotational policies were singled out as adversely affecting 
pilot proficiency. The policies result in lower in-type 
flight hours for Naval-Marine aviators than their Air Force 
counterparts enjoy. The authors also compare accident rates 
of single-seat aircraft with those of dual piloted aircraft. 
They note that when operations required of the aircraft are 
equalized as best possible, the dual piloted aircraft have
noticeably fewer accidents per ten thousand flight hours than 
do the single-seated versions. Operational F-4 aircraft were 
used to make the comparisons, comparisons limited to instru- 
ment, take-off, and landing phases of flight. Throughout 
the analysis, the authors continually refer to the importance 
of pilot proficiency but now, unlike prior authors, have al- 
luded to the possibility of pilot mental overload being 
critical in the determination of a prime factor in pilot 
error.

A shift occurred in the emphasis from pilot error induced 
accidents being caused by lack of pilot proficiency to impli- 
cations that the pilot is all too often proficient but may 
be in a state of temporary mental overload. Data extracted 
from National Transportation Safety Board records led 
Kowalsky, et al. (1974) to posit causal factors for the high
pilot error rate. Pilot error had heretofore implied that
practice of pilot procedure was the only method of improving 
pilot proficiency and thus reducing error. Kowalsky and his 
co-researchers applied cluster analysis and pattern recogni- 
tion techniques to their air carrier accident data and found 
that for non-training, non-midair accidents, the single most 
important human causal factor was that pilots were often 
temporarily overloaded and incorrectly evaluated information 
inputted to them during the overload period. Training flight 
accidents were attributed to instructor pilots delaying 
recovery or corrective actions until conditions became too
far out of tolerance to properly effect recovery. Midair
collisions were best explained by pilots attempting to meet 
time schedules or destinations where such outcomes were im- 
probable at best. 

Burin (1974) took a slightly different tack in treating 
the problem of aircraft accidents. Rather than dwell on 
accident investigator assigned causal factors, he constructed 
a measure of risk for the twelve individual aircraft models 
which contributed the most to total flight hours and
accident occurrence in the Navy-Marine inventory. Four areas 
of risk were defined to include take-off, in-flight, transi- 
tion, and landing evolutions. Accidents were assumed to occur 
in accordance with a Poisson process. Using data which cov- 
ered Fiscal Years 1969 through 1973 obtained from the Naval 
Safety Center, he constructed a risk index for each aircraft 
considered. The risk index consisted of the data derived 
risk in each of the four phases of flight multiplied by the 
associated percentage of an average flight spent in each 
phase. Although the model does in fact conform to the actual 
individual accident rates observed, it does little to isolate 
one or more specific factors enabling corrective action to be 
taken to reduce the very accident rate it models. 
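
The construction of such a risk index can be illustrated with a minimal sketch. It assumes that the per-phase products are summed into a single number, which the description above does not state explicitly; the function name and all numeric values are hypothetical and are not taken from Burin's data.

```python
# Hypothetical sketch of a phase-weighted risk index.
# ASSUMPTION: the per-phase products are summed; the source text only says
# that each phase risk is multiplied by the share of a flight spent in it.
PHASES = ("take-off", "in-flight", "transition", "landing")


def risk_index(phase_risk, phase_fraction):
    """Sum of (risk in phase) * (fraction of an average flight in phase)."""
    total_fraction = sum(phase_fraction[p] for p in PHASES)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("phase fractions must sum to 1")
    return sum(phase_risk[p] * phase_fraction[p] for p in PHASES)


# Made-up values for one aircraft model, for illustration only.
example_risk = {"take-off": 4.0, "in-flight": 0.5,
                "transition": 2.0, "landing": 6.0}
example_fraction = {"take-off": 0.05, "in-flight": 0.80,
                    "transition": 0.05, "landing": 0.10}

index = risk_index(example_risk, example_fraction)
```

A single weighted figure of this kind ranks aircraft models but, as noted above, does not point to any one correctable factor.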

A great deal of effort has been expended since the advent 
of the Naval Safety Center in maintaining extensive data 
banks of accident related information. Statistical analysis 
of available data, much of which can be construed as measures
of pilot proficiency, should enable predictive mathematical
models to be constructed. Such an undertaking was attempted
by Myers (1974). He hypothesized that measures of pilot
experience and proficiency, then available in collection 
agency data banks, would suffice to form an adequate founda- 
tion for accident rate analysis. He selected ten variables 
or factors from the Individual Flight Activity Reporting Sys- 
tem (IFARS) data bank to which he later applied statistical 
techniques of principal component analysis and cluster
analysis. The analysis was limited to two groups of fifty pilots
each. One group was composed of pilots having been involved 
in aircraft accidents, the other of pilots free of any acci- 
dent participation. Results were not as pronounced as was 
desired, however. Small sample size is suspected of having
suppressed accident predictive results.

The authors of this writing agree with the basic premise 
promoted by Myers and others. Sufficient data should be
currently available from which predictive capability can be
extracted. The variable nature of the monthly accident rate
suggests that underlying causal factors exist and that their
role in accident occurrence can be defined. What common factors
act to cause accident rate fluctuations? If statistical 
analysis can isolate variable measures associated with pilot 
proficiency, aircraft maintenance, flight mission categories, 
or fiscal management which vary directly or inversely with 
accident rate, then predictive and thus preventative
knowledge can assist in suppressing dollar and human life costs
resulting from aircraft accidents.

II. NATURE OF THE PROBLEM



Past monthly major accident rates have been computed for 
all Navy/Marine aircraft by the Naval Safety Center (NSC),
Norfolk, Virginia. The rate is defined as the total number 
of accidents in a given month multiplied by a constant factor 
of ten thousand and then divided by the total monthly hours 
flown. Major accidents, by definition, are characterized by 
extensive aircraft damage, measured in necessary man-hours to 
effect repair if repair is possible, or loss of life. 
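
The rate definition above amounts to accidents per ten thousand flight hours. A minimal sketch follows; the function name and the sample figures are illustrative assumptions, not NSC data.

```python
def accident_rate(accidents, flight_hours, per_hours=10_000):
    """Major accidents per `per_hours` flight hours for one month."""
    if flight_hours <= 0:
        raise ValueError("flight_hours must be positive")
    return accidents * per_hours / flight_hours


# Illustrative figures only: 30 major accidents in 400,000 hours flown.
rate = accident_rate(30, 400_000)
```

With these made-up figures the monthly rate is 0.75 accidents per ten thousand flight hours.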

Monthly accident rates exhibit a marked variability when 
each calendar month is compared to other months. Some month- 
ly rates, however, seem to be consistently high. January and 
July rates are higher than the yearly average for four of the 
six sequential fiscal years beginning with 1968. This
phenomenon has also been observed in U.S. Air Force accident
rates by Zeller and Marsh (1973). Such seasonal trends
and monthly rate variability cannot be attributed solely to 
weather. The purpose of this paper is to explore accident 
rate dependence on time related variable measures in hopes 
that one or more of these measures can be identified for 
later use in accident rate reduction.

III. ANALYTICAL PROCEDURES



A. DATA SOURCE 

OPNAVINST 3750.6 (Series) delineates the requirements and 
procedures for reporting each aircraft incident or accident 
involving Naval and Marine aircraft. The type and number of 
different reports required for each accident vary. The extreme
case includes seven separate reports with accompanying
photographs of the crash site, terrain and flight path
sketches, and detailed statements from knowledgeable witnesses 
or experts. The reports are to be forwarded to NSC for in- 
clusion into their master data bank. Accident data currently 
available from NSC spans the period from the early 1960's to 
the present. Approximately eighty separate variable measures 
are available for each accident occurrence. 

B. DATA SELECTION 

The initial step in the conduct of the current accident 
rate analysis was to select appropriate data points or vari- 
able measures. A data point for an accident was considered 
to be any suitable variable measure associated with the acci- 
dent. Suitable data points could have taken the form of 
accident occurrence date, pilot age, or aircraft model. A 
particular data set consisted of data points for a specific 
accident. 

A sufficient number of data sets had to be incorporated 
into the analysis to facilitate viable statistical results. 
However, the span of time defined by the data sets to be
analyzed had to be chosen with care. Were a time span chosen 
during which numerous types of aircraft were removed from the 
operational inventories, undesirable effects could have re- 
sulted. Due to the rapid advances which had occurred in air- 
craft technology, data from accidents prior to Fiscal Year 
1969 was not deemed suitable for analysis inclusion with the 
present inventory of aircraft. In the opinion of the authors,
data for accidents which occurred after Fiscal Year 1974 was 
suspected of containing gaps in information due to continu- 
ing investigations by aircraft custodians. The final deci- 
sion was, therefore, to include all major accidents which 
occurred during the Fiscal Years 1969 to 1974, inclusive. 
Two-thousand-one-hundred-ten accidents or data sets available 
within the six-year period selected were all considered suit- 
able for analysis. 

The NSC data bank provided a ready source of numerous 
data points for each accident or data set. Selection of
appropriate data points required that each point be time
dependent. Subjective decisions of time dependency were carefully
made for each data point considered for analysis. Data point 
time dependency and subsequent selection was based on the 
variable descriptions contained in the Manual of Code Classi- 
fication for Navy Aircraft Accident, Incident and Ground 
Accident Reporting (Code Manual) promulgated by NSC. The 
number of different data points selected was not governed
solely by the current study but also by the requirement for
follow-on studies to be conducted by this institution. Con- 
tinuity of successive studies must necessarily be driven by 
the consistency of logic used in formulation of a data selec- 
tion criterion. 

Fiscal funding policies at the squadron level were
suspected of contributing to the above-average observed
accident rates for January and July. Financial data of this
sort, however, was not available from the NSC data bank. An 
extensive search for suitable data proved to be unsuccessful 
and time constraints precluded further endeavors into this 
field. 

A historical breakdown of individual squadron flight hours
per month and flight hours by type aircraft was not available
from NSC. The information was deemed necessary for in-depth
analysis because it is required to calculate accident
rates by aircraft type. Because of the varied flight
envelopes in which different aircraft operate, some types 
were assumed to contribute more heavily towards the overall 
accident rate variability than were others. Properly explaining
the variability, then, required an analysis at the aircraft
type level. NSC found a source of the needed aircraft
type and squadron flight data in digital tape form from the 
Naval Supply Corps maintained Maintenance Data Collection 
System (MDCS) records kept at Mechanicsburg, Pennsylvania.

A total of thirty different data points for each of the
two-thousand-one-hundred-ten aircraft accidents were re- 
quested from NSC. Table 1 lists the variable measures ob- 
tained from NSC computer data banks. All data was received 
from NSC on eighty column Hollerith cards and was encoded in 
accordance with the Code Manual description. 

Ten data points from each data set of thirty possible 
points were selected for inclusion into the current analysis. 
These variables represent broad causal categories, some of 
which were involved in earlier studies mentioned in the
introduction. Table 2 lists the variables selected for the
current study. Pilot age and total flight time in the aircraft
model involved in the reported accident have been considered 
to be measures of pilot experience. Such measures of experi- 
ence should exhibit negative correlation with accident rate 
if they are in fact true measures of experience. Age has 
been considered to be a primary measure of caution by
philosophers and insurance corporations alike when the subject
of risk has been broached. The Navy has measured, and still measures,
pilot experience in terms of "seat time" logged in applica- 
ble aircraft models. 

Pilot proficiency, a measure of recent hours flown, has 
also been directly tied to the maintenance or increase of 
flight prowess for Naval Aviators. This factor also complies 
with the age old adage of "practice makes perfect" and should 
be negatively correlated to accident rate.

TABLE 1

DATA SET REQUESTED FROM NAVAL SAFETY CENTER

Data concerning the pilot:

1. Age
2. Injuries
3. Number of previous service tours
4. Total flying time in aircraft model in which accident
   occurred
5. Total flight hours in previous ninety days
6. Total nighttime flight hours in previous ninety days
7. Total daylight carrier landings in previous thirty days
8. Total night carrier landings in previous thirty days
9. Number of years as designated Naval Aviator

Data concerning aircraft:

1. Model
2. Damage
3. Number of tours between major aircraft rework
4. Type of last major inspection
5. Hours since last inspection
6. Identification of the system or component failure

Data concerning the flight:

1. Major command
2. Reporting custodian
3. Ship's hull number (if applicable)
4. Marine Air Wing (if applicable)
5. Location
6. Flight Purpose Code
7. Type of operation code
8. Phase of operation in which the accident occurred

Data concerning the accident:

1. Accident identification number including calendar date
2. Other aircraft damaged
3. Other personnel injured
4. Contributing causal factors
5. Special data not otherwise listed
6. Weather
7. Accident rate for the month in which the accident
   occurred

TABLE 2

DATA SET INCLUDED IN CURRENT STUDY

1. Accident rate by month (RATE)
2. Pilot's age (AGE)
3. Total flight time in accident involved aircraft model
   (TTIME)
4. Total flight time during ninety days preceding accident
   (TOT90)
5. Total night flight time during preceding ninety days
   (NITE90)
6. Daylight carrier landings during preceding thirty days
   (CLDAY)
7. Night carrier landings during preceding thirty days
   (CLNITE)
8. Number of aircraft tours (ACTOUR)
9. Aircraft flight hours since last major or minor
   inspection (ACHRS)
10. Flight Purpose Code

Measures of pilot proficiency selected for inclusion were
total flight time in the preceding ninety days, night flight
time in the preceding ninety days, the number of daylight
carrier landings in the preceding thirty days, and the number
of night carrier landings in the preceding thirty days.
Although flight hour data points were
available for twenty-four and forty-eight hours prior to 
accident occurrence, these were not selected for inclusion 
to the study because of the uncertainty of whether proficien- 
cy or fatigue would be the true factor underlying each of 
these variables. 

Measures of airframe age and general condition were re- 
presented by the number of major overhauls or rework induc- 
tions which the accident aircraft had undergone (Aircraft 
Tours). Each aircraft in the Navy/Marine inventory is 
required to undergo a Periodic Aircraft Rework (PAR) cycle
for analysis and repair after a model specific number of 
flight hours have been accumulated. The number of flight 
hours accumulated by the accident aircraft since its last 
major or minor inspection was selected as a variable measure 
of aircraft condition. This variable was also selected as a 
monitor to explain any reliability anomalies other than "new- 
better-than-used" life expectations for critical aircraft 
components as mentioned by Butterworth, et al. (1974).

The Flight Purpose Code was selected as a data point be- 
cause of its categorization of each flight accident in accord- 
ance with an operational mission type. Earlier studies by 
Kowalsky, et al. strongly suggested that training flights 
were a flight category rife with accident potential. Combat 
flights wherein the aircraft was not lost to enemy fire nor 
damage sustained therefrom have historically been credited 
with a less than average accident rate. Inclusion of basic 
flight purpose codes as data points was meant to clarify 
these prior suppositions as either correct or erroneous. 

C. PRELIMINARY DATA PREPARATION 

Parametric statistical procedures available for determin- 
ing the relationship between variables require the assumption 
of normality. The data must also be in the interval measure- 
ment scale. The raw data used included some measurements in 
the nominal and interval measurement scales. The technique 
of averaging by month was used to transform the data into
interval data. In addition, transformation of the data al- 
lowed the assumption of normality by invoking the Central 
Limit Theorem. 
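
The monthly averaging step can be sketched as follows. The record layout (dictionaries with `year`, `month`, and a variable name such as `AGE`) and the function name are assumptions made for illustration; the sample values are invented. Blank entries are skipped here rather than counted as zeros.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(records, field):
    """Average one interval-scale field over the accidents in each month."""
    by_month = defaultdict(list)
    for rec in records:
        value = rec.get(field)
        if value is not None:  # skip blanks; do not treat them as zero
            by_month[(rec["year"], rec["month"])].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Invented accident records, one dictionary per accident.
records = [
    {"year": 1971, "month": 7, "AGE": 28},
    {"year": 1971, "month": 7, "AGE": 32},
    {"year": 1971, "month": 8, "AGE": 41},
    {"year": 1971, "month": 8, "AGE": None},
]

age_by_month = monthly_means(records, "AGE")
```

Each month then contributes one averaged case to the regression, which is how seventy-two monthly cases arise from the individual accident records.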

Data points 2 through 9 of Table 2 were adjudged to be 
interval measures. Raw data for each of these variables was 
averaged by month for each of the seventy-two months within 
the total time span selected. Data point 10 was a nominal 
measure in its raw form and, therefore, required transforma- 
tion to an appropriate list of frequencies for each of the 
three major flight types to be considered. The ship's hull 
number was used as a flag in a computer program to determine 
the proportion of accidents which were attributed to carrier 
based or land based aircraft. The Flight Purpose Code con- 
sists of a three character alphanumeric code. The second of 
the three characters specifies the basic mission type and was 
used to determine the proportion, by month, of accidents which 
occurred during training flights, general service flights, and 
combat flights. The result of the raw data transformations 
was the creation of eight new variables, six of which were 
included in subsequent parametric analysis. The new variables
included in later analysis were percent carrier training
flights (CVTRNG), percent carrier service flights (CVSVCE),
percent carrier combat flights (CVCBAT), percent land based
training flights (LTRNG), percent land based service flights
(LSVCE), and percent land based combat flights (LCBAT).
Variables created but not included in the analysis were
percent carrier based flights and percent land based flights.
Each percentage dealt only with those aircraft involved in a
major accident. The new variables were adjudged to be in
compliance with interval measurement scale requirements.
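
The transformation of the nominal Flight Purpose Code into monthly percentages can be sketched as below. The mapping from the second code character to a mission type is hypothetical, since the Code Manual values are not reproduced in this paper, and the field names `hull_number` and `purpose_code` are likewise assumed for illustration.

```python
from collections import Counter

# HYPOTHETICAL mapping of the second code character to a mission type;
# the actual Code Manual values are not given in the text.
MISSION_BY_CODE = {"1": "TRNG", "2": "SVCE", "3": "CBAT"}
VARIABLES = [base + mission
             for base in ("CV", "L")
             for mission in ("TRNG", "SVCE", "CBAT")]


def category_percentages(accidents):
    """Percent of one month's accidents in each base/mission category."""
    counts = Counter()
    for acc in accidents:
        # A recorded hull number flags the aircraft as carrier based.
        base = "CV" if acc["hull_number"] else "L"
        mission = MISSION_BY_CODE[acc["purpose_code"][1]]
        counts[base + mission] += 1
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in VARIABLES}
    return {name: 100.0 * counts[name] / total for name in VARIABLES}


# Invented accidents for a single month.
month = [
    {"hull_number": "CVA-00", "purpose_code": "A1B"},
    {"hull_number": "CVA-00", "purpose_code": "X1C"},
    {"hull_number": "", "purpose_code": "B2A"},
    {"hull_number": "", "purpose_code": "C3D"},
]

percentages = category_percentages(month)
```

Because the six percentages for a month sum to one hundred, they are not independent of one another, a point worth keeping in mind when all six are offered to the regression.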

D. THE ANALYSIS TECHNIQUE 

The stepwise multiple regression computer program package 
developed by Jae-On Kim and Frank J. Kohout at the University 
of Iowa was selected as the means of conducting the statisti- 
cal analysis of the data set. The program is included in 
the Statistical Package for the Social Sciences (SPSS)
compiled and edited by Nie, et al. (1975). Kim and Kohout state
that stepwise multiple regression is a recognized technique
to: 1) "find the best linear prediction equation and evaluate
its prediction accuracy; 2) control for other confounding
factors in order to evaluate the contributions of a specific
variable or set of variables; and 3) find structural
relations and provide explanations for seemingly complex
multivariate relationships, such as is done in path analysis."
The primary purpose, however, is to evaluate and measure over- 
all dependence of a specific variable on a set of other vari- 
ables. The specific variable (dependent variable) used for 
the current study was monthly accident rate and the set of 
other variables (independent variables) consisted of those 
listed as 2 through 10 in Table 2. 
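
A generic forward stepwise selection can be sketched as follows. This is not the SPSS implementation; it is a minimal illustration that assumes ordinary least squares with an intercept, non-collinear predictors, and entry of the variable giving the largest increase in R SQUARE, stopping when the F value for entry is at or below a threshold (1.0 here, matching the stopping value reported in the results). All names and data are invented.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def r_squared(y, columns):
    """R^2 of a least-squares fit of y on the columns, with intercept.

    Assumes the columns are not collinear (a zero pivot is not handled).
    """
    n, k = len(y), len(columns)
    if k == 0:
        return 0.0
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    xc = []
    for col in columns:
        col_mean = sum(col) / n
        xc.append([v - col_mean for v in col])
    xty = [dot(xc[i], yc) for i in range(k)]
    # Augmented normal equations [X'X | X'y] on centered data.
    a = [[dot(xc[i], xc[j]) for j in range(k)] + [xty[i]] for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [u - factor * v for u, v in zip(a[r], a[i])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    return dot(coef, xty) / dot(yc, yc)


def forward_stepwise(y, predictors, f_to_enter=1.0):
    """Return [(variable, cumulative R^2), ...] in order of entry."""
    n = len(y)
    chosen, r2, steps = [], 0.0, []
    remaining = list(predictors)
    while remaining and n - len(chosen) - 2 > 0:
        def fit(name):
            return r_squared(y, [predictors[c] for c in chosen + [name]])
        best = max(remaining, key=fit)
        new_r2 = fit(best)
        resid_df = n - len(chosen) - 2
        f_value = (new_r2 - r2) * resid_df / max(1.0 - new_r2, 1e-12)
        if f_value <= f_to_enter:
            break
        chosen.append(best)
        remaining.remove(best)
        r2 = new_r2
        steps.append((best, r2))
    return steps


# Invented data: y depends on x1 and x2; x3 is unrelated to y.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, -1, 1, 1, -1, -1, 1]
x3 = [1, 1, -1, -1, -1, -1, 1, 1]
noise = [0.1, -0.1, 0.1, -0.1, -0.1, 0.1, -0.1, 0.1]
y = [2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]

steps = forward_stepwise(y, {"x1": x1, "x2": x2, "x3": x3})
```

In this invented example x1 enters first, x2 second, and x3 is rejected by the F threshold, which parallels the way the program orders variables by their contribution to explained variance.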



The computer program is designed to provide the user with 
a considerable number of control options. Although the 
majority deal with computer output formats, two of the avail- 
able options can drastically affect the validity of the re- 
gression results. One such option allows the user to include 
all data sets in the computation of correlation coefficients. 
Were each data set (case) complete, the option would be harm- 
less. However, numerous cases contained in the NSC supplied 
data were incomplete. The first three years or thirty-six 
cases lacked three critical pilot proficiency oriented data 
points. Variables 5, 6 and 7 of Table 2 were void of any 
entries for July 1968 through June 1971. Were the option 
invoked, the blank data points would have been evaluated as 
zeros and included into the computation of correlation co- 
efficients. The resultant matrix of correlation coefficients 
would have been grossly biased. 

More insidious but just as damaging would have been the 
use of an option which allowed pairwise deletion of missing 
data to be selected. With this option, a missing value for 
a particular variable causes that case to be eliminated from 
calculations involving that variable only. Such an option 
allows the user to realize maximum sample size if only a few 
missing values appear in his data. However, in the situation 
where numerous values are missing, particularly if one or a 
few variables account for the bulk of the missing data, the 
sample sizes for the individual variables would not be equal 
or nearly equal, resulting in serious computational
inaccuracies.

Listwise deletion, a more conservative and accurate 
approach, provides the computer with instructions to delete 
any case from all computation if it contains missing variable 
values. The user's sample size is subject to drastic
reduction, but computations integral to the stepwise multiple
regression are assured of being accurate. This option
was selected for all computer runs included in the current 
study. 
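
The effect of the missing-data options can be illustrated with a small sketch that compares a correlation computed after listwise deletion with one computed when blanks are read as zeros. The data are invented and the helper names are assumptions; the point is only the direction of the distortion.

```python
from math import sqrt


def pearson(xs, ys):
    """Simple (Pearson) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def listwise(cases, names):
    """Keep only the cases with no missing value in any named variable."""
    return [c for c in cases if all(c[n] is not None for n in names)]


# Invented monthly cases; CLDAY is blank for the first two months.
cases = [
    {"RATE": 1.2, "CLDAY": None},
    {"RATE": 1.1, "CLDAY": None},
    {"RATE": 0.8, "CLDAY": 4.0},
    {"RATE": 0.9, "CLDAY": 5.0},
    {"RATE": 1.0, "CLDAY": 6.0},
]

complete = listwise(cases, ["RATE", "CLDAY"])
r_listwise = pearson([c["RATE"] for c in complete],
                     [c["CLDAY"] for c in complete])

# Reading blanks as zeros keeps all five cases but distorts the result.
r_zero_filled = pearson(
    [c["RATE"] for c in cases],
    [0.0 if c["CLDAY"] is None else c["CLDAY"] for c in cases],
)
```

In this invented example the complete cases are positively correlated, while the zero-filled series yields a negative coefficient, the kind of gross bias described above.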

The stepwise multiple regression technique (Appendix C) 
is particularly useful in studies of the current type because 
as each independent variable is entered into the regression 
equation, the percentage of the total dependent variable's 
variance yet unexplained by the independent variables already 
in the regression is calculated. The percentage of the acci- 
dent rate variability explained by the time related independ- 
ent variables chosen is exactly the type of statistical output 
required to ascertain the causal factors responsible for that 
variability.

IV. RESULTS



Two separate regression calculations were made. The 
first computer run included only data points for the thirty- 
six month period from July 1971 to June 1974. All variables 
except for percent carrier based flights and percent land 
based flights were included. The second computer run
encompassed the entire six year span but did not include five
variables. The percent carrier and land based flights were 
not included nor were the three variables with excessive 
missing data for the period of July 1968 to June 1971. There- 
fore, variables 5, 6 and 7 of Table 2 were not included. 

The correlation coefficients obtained for the thirty-six 
month run are included in Table 3. Regression results indi- 
cated that the hierarchical order of variable inclusion as 
governed by the individual variable contributions towards
explaining accident rate variance was: 1) Pilot age; 2) Total
flight time during the previous ninety days; 3) Night carrier
landings during the preceding thirty days; 4) Daylight car- 
rier landings during the preceding thirty days; 5) Total 
flight time in the accident involved aircraft model; 6) Total 
night flight time during the previous ninety days; 7) Per- 
cent carrier training flights; 8) Percent carrier service 
flights; and 9) Percent land based training flights. Further 
inclusion was inhibited by the program user through imposition 
of an F statistic stopping order for F values of 1.0 or less 
(Appendix D) . 
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TABLE 3 

MATRIX OF SIMPLE CORRELATION COEFFICIENTS 





RATE 


AGE 


TTIME 


TOT 9 0 


NITE90 


RATE 


1.00000 


-0.54792 


0.06517 


0.44988 


0.23907 


AGE 


-0.54792 


1.00000 


0.17418 


-0.29876 


-0.14839 


TTIME 


0.06517 


0.17418 


1.00000 


-0.08233 


-0.47143 


TOT 90 


0.44988 


-0.29876 


-0.08233 


1.00000 


0.40628 


NITE90 


0.23907 


-0.14839 


-0.47143 


0.40628 


1.00000 


CLDAY 


-0.15891 


0.28792 


-0.04861 


0.20754 


0 . 28294 


CLNITE 


-0.25688 


0.12183 


-0.05574 


0.13295 


0.16230 


ACTOUR 


-0.23495 


0.34257 


0.16068 


-0.36285 


-0.45523 


ACHRS 


0.34738 


-0.33283 


0.09483 


0.40544 


0.05556 


CVTRNG 


-0.06672 


-0.05411 


0.00443 


-0.05277 


0.33645 


CVSCE 


0.23746 


-0.13404 


-0.18301 


0.08759 


-0.08029 


C VC BAT 


0.25093 


-0.19304 


-0.15251 


0.55128 


0.28451 


LTRNG 


-0.03787 


-0.16657 


-0.04165 


-0.38813 


-0.29947 


LSVCE 


-0.26996 


0.50471 


0.22386 


-0.13272 


-0.23025 


LCBAT 


0.27673 


-0.26953 


0.12874 


0.39751 


0.13395 




CLDAY 


CLNITE 


ACTOUR 


ACHRS 


CVTRNG 


RATE 


-0.15891 


-0.25688 


-0.23495 


0.34738 


-0.06672 


AGE 


0.28792 


0.12183 


0.34257 


-0.33283 


-0.05411 


TTIME 


-0.04861 


-0.05574 


0.16068 


0.09483 


0.00443 


TOT 90 


0.20754 


0.13295 


-0.36285 


0.40544 


-0.05277 


NITE90 


0.28294 


0.16230 


-0.45523 


0.05556 


0.33645 


CLDAY 


1.00000 


0.79831 


0.14288 


-0.03512 


-0.01193 


CLNITE 


0.79831 


1.00000 


0.00646 


-0.02526 


-0.02047 


ACTOUR 


0.14288 


0.00646 


1.00000 


-0.27071 


-0.17515 


ACHRS 


-0.03512 


-0.02526 


-0.27071 


1.00000 


-0.22963 


CVTRNG 


-0.01193 


-0.02047 


-0.17515 


-0.22963 


1.00000 


CVSVCE 


-0.11558 


-0.19834 


0.09409 


0.20775 


-0.14954 


CVCBAT 


0.16148 


0.08680 


-0.39118 


0.64853 


-0.32414 


LTRNG 


-0 . 22961 


0.02469 


0.21194 


-0.18111 


-0.33950 


LSVCE 


0.16272 


-0.01629 


0.28087 


-0.35206 


-0.19423 


LCBAT 


0.00167 


0.07397 


-0.22269 


0.52835 


-0.07665 




           CVSVCE    CVCBAT    LTRNG     LSVCE     LCBAT
RATE      0.23746   0.25093  -0.03787  -0.26996   0.27673
AGE      -0.13404  -0.19304  -0.16657   0.50471  -0.26953
TTIME    -0.18301  -0.15251  -0.04165   0.22386   0.12874
TOT90     0.08759   0.55128  -0.38813  -0.13272   0.39751
NITE90   -0.08029   0.28451  -0.29947  -0.23025   0.13395
CLDAY    -0.11558   0.16148  -0.22961   0.16272   0.00167
CLNITE   -0.19834   0.08680   0.02469  -0.01629   0.07397
ACTOUR    0.09409  -0.39118   0.21194   0.28087  -0.22269
ACHRS     0.20775   0.64853  -0.18111  -0.35206   0.52835
CVTRNG   -0.14954  -0.32414  -0.33950  -0.19423  -0.07665
CVSVCE    1.00000   0.03676  -0.27935  -0.00165  -0.03445
CVCBAT    0.03676   1.00000  -0.24296  -0.37850   0.23707
LTRNG    -0.27935  -0.24296   1.00000  -0.41513  -0.07773
LSVCE    -0.00165  -0.37850  -0.41513   1.00000  -0.23649
LCBAT    -0.03445   0.23707  -0.07773  -0.23649   1.00000
The program output provided a listing of the multiple
correlation coefficients (multiple R), their squared values
(R square), and the simple correlation coefficients (simple R).
Coefficient values for the predictive regression equation were
also provided in both standardized and non-standardized form,
BETA and B respectively. Table 4 is a summary listing of the
computer output provided by the SPSS package.

TABLE 4

REGRESSION OUTPUT SUMMARY
1. July 1971 to June 1974 data points

                                                REGRESSION
                                                COEFFICIENT
VARIABLES    MULTIPLE R   R SQUARE   SIMPLE R       (B)        BETA
AGE            0.54792    0.30022    -0.54792    -0.04457    -0.43384
TOT90          0.62462    0.39015     0.44988     0.00489     0.27336
CLNITE         0.67249    0.45224    -0.25688    -0.03347    -0.44456
CLDAY          0.69143    0.47807    -0.15891     0.00710     0.26089
TTIME          0.71108    0.50563     0.06517     0.00064     0.37329
NITE90         0.72415    0.52439     0.23907     0.02365     0.35557
CVTRNG         0.74319    0.55234    -0.06672    -0.00180    -0.08958
CVSVCE         0.75852    0.57535     0.23746     0.01091     0.24413
LTRNG          0.77339    0.59813    -0.03787     0.00391     0.22667
(CONSTANT)                                        1.10440

An analysis of variance was conducted for the regression
to determine the significance of each variable included. The
null hypothesis for the tests stated that each new variable
added to the regression did not add significantly to the
accident rate variance explained by the variables already
present in the regression. Results of the computations outlined
in Appendix D indicated that inclusion of TOT90 with AGE was
significant at the 95% confidence level. Entry of CLNITE with
AGE and TOT90 was significant at the 90% confidence level, and
inclusion of CLDAY was significant only at the 75% confidence
level. Every other variable tested failed the significance
tests at the 75% confidence level. In equation form the
regression became:

1) RATE = 1.1044 - 0.04457 (AGE) + 0.00489 (TOT90) for
a 95% confidence level,

2) RATE = 1.1044 - 0.04457 (AGE) + 0.00489 (TOT90) -
0.03347 (CLNITE) for a 90% confidence level, and

3) RATE = 1.1044 - 0.04457 (AGE) + 0.00489 (TOT90) -
0.03347 (CLNITE) + 0.0071 (CLDAY) for a 75% confidence
level.
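
As an illustration only, the three equations can be evaluated
with a short Python sketch. The constant and coefficients are
copied from the equations; the function name and the input
values are hypothetical, and the units of the inputs (years for
AGE, hours for TOT90, landing counts for CLNITE and CLDAY) are
assumed rather than stated here.

```python
# Constant and coefficients copied from the three regression equations.
CONSTANT = 1.1044
COEFFICIENTS = {"AGE": -0.04457, "TOT90": 0.00489,
                "CLNITE": -0.03347, "CLDAY": 0.0071}

# Variables retained at each confidence level, one entry per equation.
MODELS = {
    95: ("AGE", "TOT90"),
    90: ("AGE", "TOT90", "CLNITE"),
    75: ("AGE", "TOT90", "CLNITE", "CLDAY"),
}


def predicted_rate(confidence, values):
    """Evaluate the equation retained at the given confidence level."""
    terms = (COEFFICIENTS[name] * values[name] for name in MODELS[confidence])
    return CONSTANT + sum(terms)


# Hypothetical monthly averages, chosen only to exercise the arithmetic.
example = {"AGE": 28.0, "TOT90": 70.0, "CLNITE": 2.0, "CLDAY": 8.0}
for level in (95, 90, 75):
    print(level, round(predicted_rate(level, example), 5))
```

Keeping the coefficients in one table and the retained variables
in another makes it plain that the three equations differ only
in which terms are kept, not in the coefficient values.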

Since TOT90 data points included both daylight and night
flight hours, a second regression was conducted in which
corrected values for daylight hours only (DAY90) were used.
The order of inclusion into the computation did not change,
but the summary output values did. Table 5 is the summary
listing of computer output for the regression using DAY90
values.

Analysis of variance testing to determine the significance
of including each variable gave the following results:
1) CLNITE with DAY90 and AGE was significant at the 90%
confidence level; 2) CLDAY with CLNITE, DAY90 and AGE was
significant at the 75% confidence level; 3) no other variables
were significant at the 75% level or better. In equation form
the regression becomes:
TABLE 5

REGRESSION OUTPUT SUMMARY
1. July 1971 to June 1974 Data Points

                                                REGRESSION
                                                COEFFICIENT
VARIABLE     MULTIPLE R   R SQUARE   SIMPLE R       (B)        BETA
AGE            0.54792    0.30022    -0.54792    -0.04457    -0.43384
DAY90          0.61296    0.37572     0.41725     0.00489     0.25258
CLNITE         0.65520    0.42928    -0.25688    -0.03347    -0.44456
CLDAY          0.68300    0.46648    -0.15891     0.00710     0.26089
TTIME          0.69602    0.48444     0.06517     0.00064     0.37329
NITE90         0.72415    0.52439     0.23907     0.02855     0.42915
CVTRNG         0.74319    0.55234    -0.06672    -0.00180    -0.08958
CVSVCE         0.75852    0.57535     0.23746     0.01091     0.24413
LTRNG          0.77339    0.59813    -0.03787     0.00391     0.22667
(CONSTANT)                                        1.10440




1) RATE = 1.1044 - 0.04457 (AGE) + 0.00489 (DAY90) -
0.03347 (CLNITE) at the 90% confidence level and

2) RATE = 1.1044 - 0.04457 (AGE) + 0.00489 (DAY90) -
0.03347 (CLNITE) + 0.0071 (CLDAY) at the 75% confidence
level.

The regression completed for the entire seventy-two month
period of July 1968 to June 1974, from which variables with
large numbers of missing values were removed, supported the
order of inclusion for AGE and TOT90 found earlier. Because
CLDAY, CLNITE, and NITE90 data points were not included in
the input to the program, the regression selected LCBAT as
the third variable for inclusion. No other variables were
included. Analysis of variance showed TOT90 with AGE to be
significant at the 95% confidence level. LCBAT inclusion
was significant only at the 75% confidence level.
The residuals for this study are defined as the deviation
of the observed accident rate from the estimated accident
rate obtained from the appropriate regression equation. The
residuals are used as a basis for computation of the multiple
correlation coefficients. Direct examination of the residuals
also provides information relevant to the linearity and
normality assumptions necessitated by the multiple linear
regression technique. Regression analysis requires assumptions
of error component independence, component mean of zero, and
the same component variance throughout the range of dependent
variable values. The SPSS package provides a visual plot of
residuals against the predicted values of the dependent
variable determined by the regression equation. For each of
the computer runs made, examination of the plot failed to
reveal any abnormalities indicative of faulty assumptions.
V. DISCUSSION



For each of the regression studies made, the pilot's age
was most instrumental in explaining accident rate variance.
AGE explained 30.022% of the variance for each of the
thirty-six month studies. It accounted for 31.217% of the
variance for the seventy-two month study. The negative simple
correlation coefficient indicates that as the accident rate
increased, a relatively young age group was involved. Prior
studies conducted using age as a variable measure have equated
age to pilot experience. The authors of this study agree that
age is a measure of experience, but not necessarily of pilot
experience, which connotes growing old at the cockpit controls.
Older people tend to be more rational and prefer risk aversion
to risk taking. The older a person becomes, the less
impulsive he tends to be. Age also tends to provide a person
with a larger repertoire of near tragedies from which to draw
reminders or analogies for current situations.

Daylight flight time accumulated for the ninety day period
preceding accident involvement, DAY90, accounted for 7.550% of
the accident rate variance. Prior studies have equated this
variable to a measure of pilot proficiency, as indeed it
logically should be. However, the positive simple correlation
coefficient associated with DAY90 would seem to refute such
an interpretation. It is unlikely that the more proficient a
pilot becomes, the more prone he is to accident involvement.
The ninety day period is too long to attribute large numbers
of hours flown to fatigue inducement at the time of the
accident. Note, however, that the actual distribution of flight
hours within the ninety day period is not known, which
precludes the authors from discounting the possibility
of a fatigue factor with any degree of certainty. A more
likely explanation would tend to follow Goorney's supposition
that pilot complacency may increase directly with the number
of hours flown and thus contribute to accident occurrence. No
definitive explanation exists at this date for the positive
correlation noted.

Both the CLNITE and CLDAY simple correlation coefficients are
negative and would seem to support categorization of these
variables as measures of pilot proficiency. CLNITE accounts for
5.356% of the accident rate variance while CLDAY is responsible
for 3.720%. Both variables represent the portion of a flight
profile wherein pilot performance is critical. Of particular
interest is the fact that the variable measure representing
an almost purely instrument flight situation (CLNITE) is more
critical in the explanation of accident rate variability than
is the variable measure associated with a more visual (VFR)
flight situation (CLDAY). Again, risk aversion is more
likely to occur under night or instrument flight conditions,
whereas day or visual flight conditions seem to foster risk
taking.

Regardless of whether the variable measures included in
the regression are identified as belonging to pilot
proficiency or pilot experience categories, the fact that all
of them were pilot oriented should impress the reader. Pilot
error of one sort or another is listed as the single largest
cause of aircraft accidents. Results of this study would tend
to corroborate the listed cause as correct. Analysis indicates
that 46.648% of the total accident rate variability is
explained by pilot related variable measures, if we are willing
to accept statistical confidence levels of 75% as meaningful.
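
The percentages quoted in this discussion follow from
successive differences of the R SQUARE column in Table 5. The
short Python sketch below reproduces that arithmetic; the
function and variable names are illustrative and are not part
of the SPSS output.

```python
# Cumulative R SQUARE values from Table 5, in order of variable entry.
R_SQUARE_BY_STEP = [("AGE", 0.30022), ("DAY90", 0.37572),
                    ("CLNITE", 0.42928), ("CLDAY", 0.46648)]


def incremental_variance(steps):
    """Percentage of variance each variable adds at the step it enters."""
    gains, previous = {}, 0.0
    for name, r_square in steps:
        gains[name] = round(100.0 * (r_square - previous), 3)
        previous = r_square
    return gains


gains = incremental_variance(R_SQUARE_BY_STEP)
total = round(sum(gains.values()), 3)
print(gains)
print(total)
```

The four increments sum to the cumulative R SQUARE reported for
CLDAY, which is the 46.648% figure cited above.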
VI. RECOMMENDATIONS



The current study includes all aircraft communities in
its treatment of accident rates. To be so general is to miss
an opportunity to isolate the true causal factors behind the
accident rate fluctuations. The all Navy/Marine accident
rate variability is composed of many individual components
generated by different aircraft types. Each aircraft type
should be analyzed separately in an attempt to ascertain
commonalities between aircraft communities. Those communities
which seem to contribute little or nothing should be removed
from the flight hour data used to compute accident
rates. The rate fluctuations of aircraft types or communities
which contribute heavily towards the overall accident rate
would most likely not all move in concert with one another.
They would probably act in concert at times and at
other times be diametrically opposed.

Accidents involving carrier based aircraft should be
separated from those involving land based aircraft. Variable
measures which deal with each community should be relegated
to correlation studies with the appropriate carrier or land
based accident rate. To make analytical studies relating
carrier training flights with the accident rates derived from
all Navy/Marine data may bury some possible correlative results
which could be most informative. To demonstrate the degree
to which general comparisons can result in loss of valuable
information, review the regression outputs contained in Table
4 and Table 5. Removal of night flight hours from the TOT90
variable measure prompted a decrease in the associated
multiple correlation coefficient (from 0.62462 to 0.61296).

The authors of this study had as their intended goal the
analysis of accident rate by type aircraft. Flight hour data
necessary to compute accident rates by aircraft type, carrier
versus land communities, or by major commands were not
available for use within the time constraints imposed. Follow-on
studies should be encouraged to pursue a more microscopic
approach to the problem in these areas. Research involving
variables categorized as measures of pilot error should hold
the greatest rewards if the results of this initial study are
used as a guide. Fiscal policy data might also provide
definitive answers, particularly in the cases where availability
of funds is a key determinant in the tempo of operations. The
relationship between the tempo of operations and the accident
rate has been the subject of a study conducted by Robino
(1972) for the Naval Safety Center. The study was somewhat
broad in scope, but does tend to encourage a more detailed
study of this area.
APPENDIX A

AVERAGE MONTHLY DATA POINT VALUES
FISCAL YEARS

        1969   1970   1971   1972   1973   1974    AVG
JUL     1.26   1.64   1.65    .49   1.32   1.00   1.22
AUG     1.35   1.52   1.31    .96    .96    .77   1.14
SEP     1.16   1.18   1.22    .85    .75    .87   1.01
OCT     1.51   1.15   1.03    .89    .63    .97   1.03
NOV     1.36   1.23   1.52    .76   1.13    .94   1.15
DEC     1.20   1.12    .89   1.15   1.10    .82   1.05
JAN     1.83    .94    .82   1.09    .98    .77   1.07
FEB     1.58   1.59   1.61    .72    .61    .60   1.12
MAR     1.59   1.82    .91    .73    .91    .69   1.11
APR     1.27    .91    .93   1.17    .89    .44    .94
MAY     1.53   1.42    .58    .78    .90    .68    .98
JUN     1.30   1.90   1.16   1.34    .74    .44   1.15

ALL NAVY/MARINE MAJOR ACCIDENT RATES 

1. RATE = (# ACCIDENTS PER MONTH) x 10,000 / TOTAL
FLIGHT HOURS PER MONTH
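
A minimal Python sketch of the rate definition follows. The
function name and the input figures are hypothetical; no monthly
accident counts or flight hour totals are reproduced here.

```python
def accident_rate(accidents, flight_hours):
    """Major accidents per 10,000 flight hours for one month."""
    if flight_hours <= 0:
        raise ValueError("flight_hours must be positive")
    return accidents * 10_000 / flight_hours


# Hypothetical month: 30 major accidents in 250,000 flight hours.
print(round(accident_rate(30, 250_000), 2))
```

Scaling by 10,000 flight hours keeps the monthly values on the
order of one, which matches the magnitude of the tabulated
rates.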
AVERAGE MONTHLY DATA POINT VALUES
1. July 1968 through June 1971.

YR/MO AGE TTIME TOT90 NITE90 CLDAY CLNITE ACTOUR ACHRS

AVERAGE MONTHLY DATA POINT VALUES
1. July 1971 through June 1974.

YR/MO CVPCT CVTRNG CVSVCE CVCBAT LPCT LTRNG LSVCE LCBAT

6807 35.00 10.00 2.50 22.50 65.00 35.00 17.50 12.50

AVERAGE MONTHLY DATA POINT VALUES
1. July 1968 through June 1971.

AVERAGE MONTHLY DATA POINT VALUES
1. July 1971 through June 1974.



APPENDIX B 



AIRCRAFT ACCIDENTS BY MONTH AND TYPE AIRCRAFT 
ACCIDENT OCCURRENCE BY MONTH AND TYPE AIRCRAFT
1. July 1968 through June 1969

ACFT 6907 6908 6909 6910 6911 6912 7001 7002 7003 7004 7005 7006

A-3
A-4

ACCIDENT OCCURRENCE BY MONTH AND TYPE AIRCRAFT
1. July 1969 through June 1970

ACFT 7007 7008 7009 7010 7011 7012 7101 7102 7103 7104 7105 7106

ACCIDENT OCCURRENCE BY MONTH AND TYPE AIRCRAFT
1. July 1970 through June 1971

ACCIDENT OCCURRENCE BY MONTH AND TYPE AIRCRAFT
1. July 1971 through June 1972

ACFT 7207 7208 7209 7210 7211 7212 7301 7302 7303 7304 7305 7306

ACCIDENT OCCURRENCE BY MONTH AND TYPE AIRCRAFT
1. July 1972 through June 1973

TABLE B5

ACFT 7307 7308 7309 7310 7311 7312 7401 7402 7403 7404 7405 7406

ACCIDENT OCCURRENCE BY MONTH AND TYPE AIRCRAFT
1. July 1973 through June 1974
APPENDIX C

STEPWISE MULTIPLE REGRESSION

Basic multiple regression is an analytical technique whereby
a linear approximating equation is sought for a dependent
variable in terms of two or more independent variables. The
general mathematical model for multiple regression in its
unstandardized form is

$$Y' = A + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$

where $Y'$ is the regression estimate of the dependent
variable, $A$ is the Y intercept constant, the $b_i$ are the
regression coefficients, and the $X_i$ are the independent
variables.

The objective sought in the conduct of multiple regression
is to find the best linear predictive equation possible by
ensuring that the sum of squared residuals, $\sum (Y - Y')^2$,
is minimized. By minimizing the sum of squared residuals,
the regression technique maximizes the correlation between
$Y$, the observed dependent variable value, and $Y'$, the
regression estimate.

The goodness of fit of the regression equation can be
characterized by the proportion of variance explained for Y.
To do so, the square of the multiple correlation, $R^2$, is
used. $R^2$ is calculated by:

$$R^2 = \frac{SS_y - SS_{res}}{SS_y} = \frac{SS_{reg}}{SS_y}
      = \frac{\sum (Y' - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$

where $SS_y$ is the total variation or sum of squares in Y,
$SS_{res}$ is the sum of squared residuals, $SS_{reg}$ is the
regression sum of squares, and $\bar{Y}$ is the mean of the
observed Y values. The numerator for the right hand portion
of the equation represents the variation in Y explained
by the combined linear influence of the independent variables.
The denominator is a measure of the total variation in Y.
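
The sum of squares identity can be checked numerically. The
Python sketch below does so for the simplest case of a single
independent variable, fitted by ordinary least squares; the data
values and function names are hypothetical and serve only to
show that both forms of the ratio agree.

```python
def fit_simple_regression(x, y):
    """Least squares estimates (A, b) for Y' = A + b X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    return mean_y - b * mean_x, b


def sums_of_squares(x, y):
    """Return (SS_y, SS_res, SS_reg) for the least squares line."""
    a, b = fit_simple_regression(x, y)
    mean_y = sum(y) / len(y)
    estimates = [a + b * xi for xi in x]
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - ei) ** 2 for yi, ei in zip(y, estimates))
    ss_reg = sum((ei - mean_y) ** 2 for ei in estimates)
    return ss_y, ss_res, ss_reg


# Hypothetical data, used only to check the identity numerically.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
ss_y, ss_res, ss_reg = sums_of_squares(x, y)
print(round((ss_y - ss_res) / ss_y, 6), round(ss_reg / ss_y, 6))
```

The equality of the two ratios depends on the coefficients being
least squares estimates; for an arbitrary line, SS_reg and
SS_res need not sum to SS_y.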

The determination of the regression equation stems from
the simultaneous solution of the standardized normal equations

$$
\begin{aligned}
B_1 + B_2 r_{12} + B_3 r_{13} + \cdots + B_k r_{1k} &= r_{Y1} \\
B_1 r_{12} + B_2 + B_3 r_{23} + \cdots + B_k r_{2k} &= r_{Y2} \\
&\;\;\vdots \\
B_1 r_{1k} + B_2 r_{2k} + B_3 r_{3k} + \cdots + B_k &= r_{Yk}
\end{aligned}
$$

where the $B_i$ are the standardized regression coefficients
of the independent variables $X_i$, and the $r_{ij}$ are the
Pearson or product moment correlations between variables
$X_i$ and $X_j$.

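
For two independent variables the standardized normal equations
reduce to a pair of linear equations that can be solved in
closed form. The Python sketch below does this for CLNITE and
CLDAY, using correlations taken from this study's correlation
matrix. The two-variable model itself is an illustration only
and is not one of the regressions reported in the study; the
function name is hypothetical.

```python
def standardized_betas_two_predictors(r_y1, r_y2, r_12):
    """Solve B1 + B2*r12 = rY1 and B1*r12 + B2 = rY2 by Cramer's rule."""
    determinant = 1.0 - r_12 ** 2
    if determinant == 0.0:
        raise ValueError("predictors are perfectly correlated")
    beta_1 = (r_y1 - r_y2 * r_12) / determinant
    beta_2 = (r_y2 - r_y1 * r_12) / determinant
    return beta_1, beta_2


# Correlations from the study's matrix: X1 = CLNITE, X2 = CLDAY, Y = RATE.
r_y1, r_y2, r_12 = -0.25688, -0.15891, 0.79831
beta_1, beta_2 = standardized_betas_two_predictors(r_y1, r_y2, r_12)

# For standardized variables, R^2 equals the sum of B_i * r_Yi.
r_square = beta_1 * r_y1 + beta_2 * r_y2
print(round(beta_1, 5), round(beta_2, 5), round(r_square, 5))
```

Because CLNITE and CLDAY are strongly intercorrelated (0.79831),
the solved coefficients differ markedly from the simple
correlations, which is why the normal equations must be solved
simultaneously rather than one variable at a time.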

Stepwise inclusion of independent variables available in 
the Statistical Package for the Social Sciences (SPSS version 
6) used for this study allows entry of each variable, one at 
a time. Inclusion criteria are such that the order of varia- 
ble entry is dictated by accountability of each for previous- 
ly unexplained variance. Thus, the variable that explains the 
greatest amount of variance previously unexplained by the 
variables already in the equation is the next to be entered. 
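
The entry rule described above can be sketched in Python
directly from a correlation matrix. The sketch is a
simplification under stated assumptions: it enters whichever
variable yields the largest R SQUARE at each step and stops
when the gain falls to `min_gain` or below, whereas SPSS applies
its own entry criteria, which are not reproduced here. Only five
of the study's variables are used, so the order of entry is not
comparable with Tables 4 and 5. All function names are
hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[row][j] * solution[j] for j in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def r_square(subset, r_xx, r_yx):
    """R^2 of Y on the chosen predictors, computed from correlations alone."""
    sub_matrix = [[r_xx[i][j] for j in subset] for i in subset]
    sub_rhs = [r_yx[i] for i in subset]
    betas = solve_linear_system(sub_matrix, sub_rhs)
    return sum(b * r for b, r in zip(betas, sub_rhs))


def forward_stepwise(names, r_xx, r_yx, min_gain=0.0):
    """Enter, one at a time, the variable adding the most explained variance."""
    entered, history, current = [], [], 0.0
    remaining = list(range(len(names)))
    while remaining:
        best = max(remaining, key=lambda j: r_square(entered + [j], r_xx, r_yx))
        best_r2 = r_square(entered + [best], r_xx, r_yx)
        if best_r2 - current <= min_gain:
            break
        entered.append(best)
        remaining.remove(best)
        current = best_r2
        history.append((names[best], round(best_r2, 5)))
    return history


# Correlations copied from the study's matrix (five-variable subset).
NAMES = ["CLDAY", "CLNITE", "ACTOUR", "ACHRS", "CVTRNG"]
R_XX = [
    [1.00000, 0.79831, 0.14288, -0.03512, -0.01193],
    [0.79831, 1.00000, 0.00646, -0.02526, -0.02047],
    [0.14288, 0.00646, 1.00000, -0.27071, -0.17515],
    [-0.03512, -0.02526, -0.27071, 1.00000, -0.22963],
    [-0.01193, -0.02047, -0.17515, -0.22963, 1.00000],
]
R_YX = [-0.15891, -0.25688, -0.23495, 0.34738, -0.06672]  # with RATE

history = forward_stepwise(NAMES, R_XX, R_YX)
for name, cumulative_r2 in history:
    print(name, cumulative_r2)
```

Working from correlations rather than raw observations keeps
the sketch self-contained, since the monthly data points are not
reproduced in this appendix.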
APPENDIX D

STATISTICAL TESTS FOR SIGNIFICANCE

Procedural statistical testing for goodness of fit for the
regression equation is accomplished by conducting Analysis of
Variance (ANOVA). The null hypothesis, $H_0$, used for the
testing is that the next variable to be added in a stepwise
manner would not add significantly to the explained variance
in the dependent variable, Y, already accounted for by the
variables included in the regression equation. The alternative
hypothesis, $H_1$, directly contradicts the null hypothesis.
An equivalent statement of $H_0$ would be that all "k" of the
regression coefficients are identically equal to zero, as
opposed to the statement that at least one coefficient is not
equal to zero.

The first step of stepwise regression with forward elimination
is for each of the independent variables to be individually
regressed against the dependent variable. F statistics
are computed for each dependent-independent variable pair.
The pair having the highest F statistic computed is permanently
selected for inclusion into the regression equation. The
F statistic used in the first step is

$$F = \frac{r_{Y1}^2 / 1}{(1 - r_{Y1}^2)/(N - 2)}
    = \frac{R^2_{Y.1,2,\ldots,k} / k}{(1 - R^2_{Y.1,2,\ldots,k})/(N - k - 1)}
      \quad \text{with } k = 1$$

where $r_{Y1}$ is the simple correlation, $R_{Y.1,2,\ldots,k}$ is
the multiple correlation for "k" independent variable inclusion,
and "k" is the number of independent variables entered. N
is the sample size used in the particular regression under
consideration.

For each successive step in determining the regression
equation, variables in the equation are retained and independent
variables not in the equation are temporarily entered for
F statistic computation. At each step, the combination of
variables resulting in the highest F statistic is kept. The
general form of the F statistic used for each successive step
is

$$F = \frac{\text{incremental } SS \text{ due to } X_k}{SS_{res}/(N - k - 1)}
    = \frac{r^2_{Y(k.1,2,\ldots,k-1)} / 1}{(1 - R^2_{Y.1,2,\ldots,k})/(N - k - 1)}$$

where SS is the sum of squares, $X_k$ is the k-th independent
variable to be added, $SS_{res}$ is the sum of squares of the
residuals, and $r_{Y(k.1,2,\ldots,k-1)}$ is the partial correlation
between Y and $X_k$ when $X_i$, i = 1,2,...,k-1, are held fixed.

After all independent variables which contribute to the
explained variance have been added to the regression equation
in the order dictated by the amount of variance accounted for,
the significance of each added variable can be determined
through use of a separate F statistic. The general form for
the statistic is

$$F = \frac{[R^2_{Y.(1,2,\ldots,k)} - R^2_{Y.(1,2,\ldots,k-1)}] / M}{[1 - R^2_{Y.(1,2,\ldots,k)}] / (N - k - 1)}$$

where "k" is the number of the variable to be tested for
inclusion and M is the number of degrees of freedom associated
with the variable(s) tested.
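
As a hedged illustration, the statistic can be computed from the
R SQUARE column of Table 4. The Python sketch below assumes
N = 36 (one data point per month from July 1971 to June 1974)
and M = 1 for each variable; neither value is stated explicitly
alongside the formula, and the function name is hypothetical.

```python
def f_for_added_variable(r2_with, r2_without, n, k, m=1):
    """F for the variance added by m variable(s); k counts variables after entry."""
    numerator = (r2_with - r2_without) / m
    denominator = (1.0 - r2_with) / (n - k - 1)
    return numerator / denominator


N = 36  # assumed sample size: 36 monthly data points

# (variable, R SQUARE after entry, R SQUARE before entry, k), from Table 4.
STEPS = [("TOT90", 0.39015, 0.30022, 2),
         ("CLNITE", 0.45224, 0.39015, 3),
         ("CLDAY", 0.47807, 0.45224, 4)]

f_values = {name: f_for_added_variable(after, before, N, k)
            for name, after, before, k in STEPS}
for name, value in f_values.items():
    print(name, round(value, 3))
```

Each value would then be compared with a tabulated F critical
value for M and N-k-1 degrees of freedom at the chosen
confidence level. Under the assumptions above the F values
decline from TOT90 to CLNITE to CLDAY, which is the same
ordering as the 95%, 90% and 75% confidence levels reported in
the results.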